Towards a geometrical model for polyrepresentation of information objects
The principle of polyrepresentation is one of the fundamental recent developments in the field of interactive retrieval. An open problem is how to define a framework which unifies different aspects of polyrepresentation and allows for their application in several ways. Such a framework can be geometrical in nature, and it may embrace concepts known from quantum theory. In this short paper, we use examples to discuss what such a framework could look like, with a focus on information objects. We further show how it can be exploited to find a cognitive overlap of different representations on the one hand, and to combine different representations by means of knowledge augmentation on the other. We discuss the potential that lies within a geometrical framework and motivate its further development.
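A minimal sketch of one possible geometrical reading follows. It assumes (our assumption, not the paper's definition) that each representation of an information object is a vector in its own space, that a query aspect is a direction in that space, and that, as in quantum theory, the probability-like score is the squared length of the projection. Representations are combined with the tensor product, so an object only scores highly overall if it scores highly under every representation, which loosely mirrors the idea of a cognitive overlap. All helper names and toy vectors are hypothetical.

```python
import math
from typing import Sequence

def normalise(v: Sequence[float]) -> list[float]:
    """Scale a vector to unit length so it can act as a state or a direction."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalise the zero vector")
    return [x / norm for x in v]

def inner(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean inner product (real-valued vectors only in this sketch)."""
    return sum(a * b for a, b in zip(u, v))

def tensor(u: Sequence[float], v: Sequence[float]) -> list[float]:
    """Tensor (Kronecker) product: joins two representations into one space."""
    return [a * b for a in u for b in v]

def projection_probability(state: Sequence[float], direction: Sequence[float]) -> float:
    """Squared length of the projection of the unit state onto the unit direction."""
    return inner(normalise(state), normalise(direction)) ** 2

# Hypothetical example: one object seen through two representations.
doc_title, query_title = [1.0, 1.0, 0.0], [1.0, 0.0, 0.0]
doc_context, query_context = [3.0, 4.0], [1.0, 0.0]

p_title = projection_probability(doc_title, query_title)        # about 0.5
p_context = projection_probability(doc_context, query_context)  # about 0.36
p_both = projection_probability(tensor(doc_title, doc_context),
                                tensor(query_title, query_context))
# For tensor products the score factorises: p_both equals p_title * p_context.
```

The factorisation is what makes the tensor product attractive here: a weak match under either representation pulls the combined score down.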
Determining the polarity of postings for discussion search
When performing discussion search, it might be desirable to consider non-topical measures like the number of positive and negative replies to a posting, for instance as one possible indicator of the trustworthiness of a comment. Systems like POLAR are able to integrate such values into the retrieval function. To detect the polarity of postings automatically, they need to be classified as positive or negative w.r.t. the comment or document they annotate. We present a machine learning approach to polarity detection based on Support Vector Machines. We discuss and identify appropriate term and context features. Experiments with ZDNet News show that an accuracy of around 79% to 80% can be achieved when automatically classifying comments according to their polarity.
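The classification step can be sketched with a linear SVM trained on term features only. Pegasos-style stochastic sub-gradient descent on the regularised hinge loss is our choice for a dependency-free example; the authors' training setup and their context features are not reproduced, and the function names and toy postings are hypothetical.

```python
import random
from collections import Counter

BIAS = "__bias__"  # constant feature standing in for the intercept

def features(text: str) -> Counter:
    """Term features: lower-cased, whitespace-separated tokens plus a bias feature."""
    x = Counter(text.lower().split())
    x[BIAS] = 1
    return x

def score(w: dict, x: Counter) -> float:
    return sum(w.get(f, 0.0) * v for f, v in x.items())

def train_linear_svm(texts, labels, lam=0.01, epochs=200, seed=0):
    """Pegasos: labels are +1 (positive reply) or -1 (negative reply).

    For simplicity the bias weight is regularised like every other weight.
    """
    rng = random.Random(seed)
    data = [(features(s), y) for s, y in zip(texts, labels)]
    w: dict = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * score(w, x) < 1  # hinge loss is active for this example
            for f in w:  # sub-gradient of (lam / 2) * ||w||^2 shrinks every weight
                w[f] *= 1.0 - eta * lam
            if violated:  # sub-gradient of the hinge loss moves w towards y * x
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w

def predict(w: dict, text: str) -> int:
    return 1 if score(w, features(text)) >= 0 else -1

texts = ["great point i agree", "agree completely thanks", "good comment thanks",
         "this is wrong", "misleading and wrong claim", "totally disagree"]
labels = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(texts, labels)
```

The predicted labels could then be counted per posting and handed to a retrieval function as the non-topical values mentioned above.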
Exploiting information needs and bibliographics for polyrepresentative document clustering
In this paper we explore the potential of combining the principle of polyrepresentation with document clustering. Our idea is discussed and evaluated both for polyrepresentation of information needs and for document-based polyrepresentation, where bibliographic information is used as a representation. The main idea is to present the user with highly ranked polyrepresentative clusters to support the search process. Our evaluation suggests that our approach is capable of increasing retrieval performance, although performance varies between queries with a high and a low number of relevant documents.
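The grouping idea can be sketched as follows, assuming that each representation (for example the search terms, a work-task description, or bibliographic fields) produces its own scored result list, and that a polyrepresentative cluster is the set of documents retrieved by exactly the same combination of representations. Clusters backed by more representations are presented first. This is an illustrative simplification, not necessarily the clustering used in the paper; the representation names, document IDs and scores are hypothetical.

```python
def polyrepresentative_clusters(results: dict) -> list:
    """Group documents by the exact set of representations that retrieved them.

    results maps a representation name to {doc_id: score}. Clusters supported by
    more representations (a larger cognitive overlap) come first; ties are broken
    by the best document score in the cluster.
    """
    seen_by: dict = {}
    best: dict = {}
    for rep, ranking in results.items():
        for doc, score in ranking.items():
            seen_by.setdefault(doc, set()).add(rep)
            best[doc] = max(best.get(doc, score), score)
    clusters: dict = {}
    for doc, reps in seen_by.items():
        clusters.setdefault(frozenset(reps), []).append(doc)
    ordered = sorted(clusters.items(),
                     key=lambda item: (-len(item[0]), -max(best[d] for d in item[1])))
    return [(set(reps), sorted(docs, key=lambda d: -best[d])) for reps, docs in ordered]

results = {
    "query":         {"d1": 0.9, "d2": 0.7, "d3": 0.4},
    "work_task":     {"d1": 0.6, "d3": 0.8, "d4": 0.5},
    "bibliographic": {"d1": 0.5, "d4": 0.3},
}
clusters = polyrepresentative_clusters(results)
# Cluster documents in presentation order: ['d1'], ['d3'], ['d4'], ['d2']
```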
Multi-facet classification of e-mails in a helpdesk scenario
Helpdesks have to manage a huge number of support requests, which are usually submitted via e-mail. In order to be assigned to experts efficiently, incoming e-mails have to be classified w.r.t. several facets, in particular topic, support type and priority. It is desirable to perform these classifications automatically. We report on experiments using Support Vector Machines and k-Nearest-Neighbours, respectively, for the given multi-facet classification task. The challenge is to define suitable features for each facet. Our results suggest that improvements can be gained for all facets, and they also reveal which features are promising for a particular facet.
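A minimal k-nearest-neighbour sketch of multi-facet classification: one independent classifier per facet, each voting among the k most similar training e-mails. For brevity it uses the same bag-of-words cosine features for every facet; since the paper's point is that each facet may need its own features, treat `vectorise` as a hypothetical placeholder. The e-mails and labels are invented for illustration.

```python
import math
from collections import Counter

def vectorise(text: str) -> Counter:
    """Placeholder features: bag of lower-cased words (same for every facet here)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_label(query: Counter, examples: list, k: int = 3) -> str:
    """Majority label among the k training examples most similar to the query."""
    ranked = sorted(examples, key=lambda ex: cosine(query, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def classify_facets(text: str, training: dict, k: int = 3) -> dict:
    """Run one independent k-NN classifier per facet."""
    q = vectorise(text)
    return {facet: knn_label(q, examples, k) for facet, examples in training.items()}

emails = [
    ("printer not printing urgent deadline", {"topic": "printer", "type": "incident", "priority": "high"}),
    ("printer paper jam again", {"topic": "printer", "type": "incident", "priority": "low"}),
    ("how do i install printer driver", {"topic": "printer", "type": "question", "priority": "low"}),
    ("vpn connection fails urgent", {"topic": "network", "type": "incident", "priority": "high"}),
    ("vpn password reset question", {"topic": "network", "type": "question", "priority": "low"}),
]
training = {
    facet: [(vectorise(text), labels[facet]) for text, labels in emails]
    for facet in ("topic", "type", "priority")
}
result = classify_facets("printer broken urgent", training)
# {'topic': 'printer', 'type': 'incident', 'priority': 'high'}
```

Keeping the facets independent means each one can later receive its own feature extractor or its own classifier (for example an SVM) without touching the others.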
Applying Cross-cultural theory to understand users’ preferences on interactive information retrieval platform design
Presented at EuroHCIR 2014, the 4th European Symposium on Human-Computer Interaction and Information Retrieval, 13th September 2014, at BCS London Office, Covent Garden, London. In this paper we look at using culture to group users and to model users’ preferences in cross-cultural information retrieval, in order to investigate the relationship between users’ search preferences and their cultural background. We first briefly review and discuss website localisation. We continue by examining culture and Hofstede’s cultural dimensions. We identified a link between Hofstede’s five dimensions and user experience. We drew an analogy for each of the five dimensions and developed six hypotheses from these analogies. These hypotheses were then tested by means of a user study. Whilst the key findings of the study suggest that cross-cultural theory can be used to model users’ preferences for information retrieval, further work is needed on how cultural dimensions can be applied to inform search interface design.
Combining cognitive and system-oriented approaches for designing IR user interfaces
Poster at the AIR workshop 2008, London, England.
Scalable DB+IR technology: processing Probabilistic Datalog with HySpirit
Probabilistic Datalog (PDatalog, proposed in 1995) is a probabilistic variant of Datalog and an elegant conceptual idea for modelling Information Retrieval in a logical, rule-based programming paradigm. Making PDatalog work in real-world applications requires more than probabilistic facts and rules and the semantics associated with evaluating programs. In this paper, we report some of the key features of the HySpirit system that are required to scale the execution of PDatalog programs.
Firstly, probability estimation must be expressible in PDatalog. Secondly, fuzzy-like predicates are required to model vague predicates (e.g. the vague matching of attributes such as age or price). Thirdly, handling large data sets raises scalability issues; therefore, HySpirit provides probabilistic relational indexes and parallel and distributed processing. The main contribution of this paper is a consolidated view of the methods HySpirit uses to make PDatalog applicable in real-scale applications that involve a wide range of requirements typical of data (information) management and analysis.
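The three requirements can be illustrated in plain Python; this is not HySpirit's syntax or API. Facts term(T, D) are estimated by relative term frequency, a vague predicate returns a degree rather than true or false, and the Datalog-style rule retrieve(D) :- qterm(T), term(T, D) is evaluated by multiplying the probabilities of independent events within a derivation and combining alternative derivations as an independent disjunction. The independence assumption, the linear vague match, and all names and numbers are assumptions of this sketch.

```python
from collections import Counter, defaultdict

def estimate_term_facts(docs: dict) -> dict:
    """Probability estimation: term(T, D) with P = tf(T, D) / |D| (relative frequency)."""
    facts = {}
    for d, tokens in docs.items():
        for t, tf in Counter(tokens).items():
            facts[(t, d)] = tf / len(tokens)
    return facts

def vague_equal(value: float, target: float, tolerance: float) -> float:
    """Fuzzy-like predicate: degree 1 at the target, falling linearly to 0 at +/- tolerance."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    return max(0.0, 1.0 - abs(value - target) / tolerance)

def evaluate_retrieve(qterm: dict, term: dict) -> dict:
    """Evaluate retrieve(D) :- qterm(T), term(T, D).

    Within one derivation the two facts are treated as independent events, so
    their probabilities multiply. Alternative derivations of the same retrieve(D)
    use different facts and are combined as an independent disjunction:
    P(e1 or e2 or ...) = 1 - (1 - p1)(1 - p2)...
    """
    p_none = defaultdict(lambda: 1.0)  # probability that no derivation holds
    for (t, d), p_td in term.items():
        if t in qterm:
            p_none[d] *= 1.0 - qterm[t] * p_td
    return {d: 1.0 - p for d, p in p_none.items()}

docs = {"d1": ["datalog", "retrieval", "retrieval", "logic"], "d2": ["retrieval", "database"]}
term = estimate_term_facts(docs)
scores = evaluate_retrieve({"retrieval": 0.6, "logic": 0.4}, term)
# d1: 1 - (1 - 0.6*0.5)(1 - 0.4*0.25) = 0.37; d2: 0.6*0.5 = 0.3
cheap = vague_equal(125.0, target=100.0, tolerance=50.0)  # 0.5
```

If two derivations shared an underlying event, the simple product formula would no longer be valid and the event expressions would have to be evaluated properly.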
Identifying the relevance of personal values to e-government portals' success: insights from a Delphi study
Most governments around the world have put considerable financial resources into developing e-government systems and have made significant efforts to provide information and services online. However, previous research shows that the rates of adoption and success of e-government systems vary significantly across countries. It is argued here that culture can be an important factor affecting e-government success. This paper aims to explore the relevance of personal values to e-government success from an individual user’s perspective, using the ten basic values identified by Schwartz. A Delphi study was carried out with a group of experts to identify the personal values most relevant to e-government success from an individual’s point of view. The findings suggest that four of the ten values, namely Self-direction, Security, Stimulation, and Tradition, are the most likely to affect success. The findings provide a basis for developing a comprehensive e-government evaluation framework, to be validated using a large-scale survey in Saudi Arabia.
Preliminary study of technical terminology for the retrieval of scientific book metadata records
Books represented only by brief metadata (book records) are particularly hard to retrieve. One way of improving their retrieval is to extract retrieval-enhancing features from them. This work focusses on scientific (physics) book records. We ask whether their technical terminology can be used as a retrieval-enhancing feature. A study of 18,443 book records shows a strong correlation between their technical terminology and their likelihood of relevance. Using this finding for retrieval yields gains of more than 5% in precision and recall.
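The feature and the correlation test can be sketched as follows, assuming that a record's technical terminology is measured as the fraction of its tokens found in a physics term list, and that its correlation with binary relevance is the Pearson (point-biserial) coefficient. The term list, the records, the relevance labels and the linear `boosted_score` weighting are hypothetical; the study's actual terminology extraction and retrieval model may differ.

```python
import math

def terminology_ratio(tokens: list[str], technical_terms: set[str]) -> float:
    """Share of a record's tokens that are technical terms (0.0 for empty records)."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in technical_terms) / len(tokens)

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; with 0/1 relevance labels this is the point-biserial r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)

def boosted_score(base_score: float, ratio: float, weight: float = 0.5) -> float:
    """Hypothetical re-ranking: raise a record's score in proportion to its terminology ratio."""
    return base_score * (1.0 + weight * ratio)

TECH_TERMS = {"quantum", "boson", "lagrangian", "entropy"}  # hypothetical term list
records = [  # (brief metadata record, relevance label), invented for illustration
    ("quantum field theory lagrangian", 1),
    ("entropy and boson statistics", 1),
    ("history of the royal society", 0),
    ("a life in science", 0),
]
ratios = [terminology_ratio(text.split(), TECH_TERMS) for text, _ in records]
r = pearson(ratios, [rel for _, rel in records])
```

On these toy records the correlation is perfect, which is an artefact of the invented data rather than an expectation for real collections.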
A Probabilistic Framework for Information Modelling and Retrieval Based on User Annotations on Digital Objects
Annotations are a means to make critical remarks, to explain and
comment things, to add notes and give opinions, and to relate objects.
Nowadays, they can be found in digital libraries and collaboratories,
for example as a building block for scientific discussion on the one
hand or as private notes on the other. We further find them in product
reviews, scientific databases and many "Web 2.0" applications; even
well-established concepts like emails can be regarded as annotations
in a certain sense. Digital annotations can be (textual) comments,
markings (i.e. highlighted parts) and references to other documents
or document parts. Since annotations convey information which is
potentially important to satisfy a user's information need, this
thesis tries to answer the question of how to exploit annotations for
information retrieval. It gives a first answer to the question whether
retrieval effectiveness can be improved with annotations.
A survey of the "annotation universe" reveals some facets of
annotations; for example, they can be content level annotations
(extending the content of the annotation object) or meta level ones
(saying something about the annotated object). Besides the annotations
themselves, other objects created during the process of annotation can
be interesting for retrieval, namely the annotated fragments.
These objects are integrated into an object-oriented model comprising
digital objects such as structured documents and annotations as well
as fragments. In this model, the different relationships among the
various objects are reflected. From this model, the basic data
structure for annotation-based retrieval, the structured annotation
hypertext, is derived.
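A minimal sketch of such an object-oriented model using Python dataclasses: annotations are themselves digital objects (so they can be annotated), they are marked as content-level or meta-level, and they can target whole objects or fragments. The class names, fields and relation labels in `hypertext_edges` are hypothetical simplifications of the model described in the thesis.

```python
from dataclasses import dataclass, field
from enum import Enum

class Level(Enum):
    CONTENT = "content"  # extends the content of the annotated object
    META = "meta"        # says something about the annotated object

@dataclass
class DigitalObject:
    oid: str
    text: str
    parts: list["DigitalObject"] = field(default_factory=list)  # structured-document subparts

@dataclass
class Fragment:
    """A marked part of a digital object, e.g. a highlighted passage."""
    fid: str
    target: DigitalObject
    start: int
    end: int

    @property
    def text(self) -> str:
        return self.target.text[self.start:self.end]

@dataclass
class Annotation(DigitalObject):
    """Annotations are digital objects too, so they can themselves be annotated."""
    level: Level = Level.CONTENT
    annotates: list = field(default_factory=list)  # DigitalObjects and/or Fragments

def hypertext_edges(annotations: list[Annotation]) -> list[tuple[str, str, str]]:
    """Collect (source, relation, target) links of the structured annotation hypertext."""
    edges = []
    for a in annotations:
        for target in a.annotates:
            if isinstance(target, Fragment):
                edges.append((a.oid, "annotates_fragment", target.fid))
                edges.append((target.fid, "fragment_of", target.target.oid))
            else:
                edges.append((a.oid, "annotates", target.oid))
    return edges

doc = DigitalObject("d1", "POLAR supports discussion search.")
frag = Fragment("f1", doc, 0, 5)                             # the highlighted word "POLAR"
a1 = Annotation("a1", "Nice idea!", level=Level.META, annotates=[frag])
a2 = Annotation("a2", "See also HySpirit.", annotates=[a1])  # annotation of an annotation
edges = hypertext_edges([a1, a2])
```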
In order to thoroughly exploit the information contained in structured
annotation hypertexts, a probabilistic, object-oriented logical
framework called POLAR is introduced. In POLAR, structured annotation
hypertexts can be modelled by means of probabilistic propositions and
four-valued logics. POLAR allows for specifying several relationships
among annotations and annotated (sub)parts or fragments. Queries can
be posed to extract the knowledge contained in structured annotation
hypertexts. POLAR supports annotation-based retrieval, i.e. document
and discussion search, by applying an augmentation strategy (knowledge
augmentation, propagating propositions from subcontexts like annotations,
or relevance augmentation, where retrieval status values are propagated)
in conjunction with probabilistic inference, where P(d -> q), the probability
that a document d implies a query q, is estimated.
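The two augmentation strategies can be contrasted in a short sketch. It assumes that each context holds term probabilities P(t|d), that an annotation's knowledge reaches the annotated document with an accessibility probability `acc`, that all events are independent, that retrieval status values can be read as probabilities, and that P(d -> q) is estimated as the sum over query terms of P(q|t)·P(t|d). These are common textbook assumptions, not POLAR's exact semantics or syntax, and all names and numbers are illustrative.

```python
def augment(doc_terms: dict, annotations: list) -> dict:
    """Knowledge augmentation: propagate term propositions from annotations into d.

    doc_terms:   {term: P(t|d)} for the document itself.
    annotations: [(acc, {term: P(t|a)})], acc = probability that the annotation's
                 knowledge is accessible from (propagated to) d.
    A term holds in the augmented context if it holds in d or is propagated from
    at least one annotation; all events are assumed independent.
    """
    terms = set(doc_terms).union(*(ann for _, ann in annotations))
    augmented = {}
    for t in terms:
        p_absent = 1.0 - doc_terms.get(t, 0.0)
        for acc, ann_terms in annotations:
            p_absent *= 1.0 - acc * ann_terms.get(t, 0.0)
        augmented[t] = 1.0 - p_absent
    return augmented

def p_implies(context: dict, query: dict) -> float:
    """Estimate P(d -> q) as sum over t of P(q|t) * P(t|d); query weights sum to 1."""
    return sum(w * context.get(t, 0.0) for t, w in query.items())

def relevance_augmentation(rsv_doc: float, annotation_rsvs: list) -> float:
    """Relevance augmentation: propagate retrieval status values, not propositions."""
    p_not_relevant = 1.0 - rsv_doc
    for acc, rsv in annotation_rsvs:
        p_not_relevant *= 1.0 - acc * rsv
    return 1.0 - p_not_relevant

doc = {"polar": 0.5}
anns = [(0.5, {"polar": 0.4, "annotation": 0.6})]
query = {"polar": 0.5, "annotation": 0.5}
aug = augment(doc, anns)
before, after = p_implies(doc, query), p_implies(aug, query)  # 0.25 vs about 0.45
```

The difference between the strategies is where propagation happens: knowledge augmentation enriches the document's context before inference, whereas relevance augmentation combines scores computed separately for each object.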
POLAR's semantics is based on possible worlds and accessibility
relations. It is implemented on top of four-valued probabilistic Datalog.
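To illustrate the four truth values used in such a logic (true, false, unknown, inconsistent), the sketch below gives each proposition independent probabilities of positive and negative evidence, for example from agreeing and disagreeing annotations. The independence of the two kinds of evidence and all names are assumptions of this sketch; POLAR itself defines its semantics through possible worlds and accessibility relations.

```python
from enum import Enum

class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"
    INCONSISTENT = "inconsistent"

def truth_value(positive_evidence: bool, negative_evidence: bool) -> Truth:
    """Map the presence of supporting and contradicting evidence to a truth value."""
    if positive_evidence and negative_evidence:
        return Truth.INCONSISTENT
    if positive_evidence:
        return Truth.TRUE
    if negative_evidence:
        return Truth.FALSE
    return Truth.UNKNOWN

def truth_distribution(p_pos: float, p_neg: float) -> dict:
    """Probability of each truth value, assuming independent positive and negative evidence."""
    dist = {}
    for pos in (True, False):
        for neg in (True, False):
            p = (p_pos if pos else 1 - p_pos) * (p_neg if neg else 1 - p_neg)
            dist[truth_value(pos, neg)] = p
    return dist

dist = truth_distribution(0.8, 0.3)
# TRUE 0.56, FALSE 0.06, INCONSISTENT 0.24, UNKNOWN 0.14 (they sum to 1)
```

Unlike a two-valued logic, this keeps "no evidence" apart from "conflicting evidence", which matters when annotations disagree.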
POLAR's core retrieval functionality, knowledge augmentation with
probabilistic inference, is evaluated for discussion and document
search. The experiments show that all relevant POLAR objects, merged
annotation targets, fragments and content annotations, are able to
increase retrieval effectiveness when used as a context for discussion
or document search. Additional experiments reveal that we can determine
the polarity of annotations with an accuracy of around 80%.
- …